SentTopic-MultiRank: a Novel Ranking Model for Multi-Document Summarization
نویسندگان
چکیده
Extractive multi-document summarization is mostly treated as a sentence ranking problem. Existing graph-based ranking methods for key-sentence extraction usually attempt to compute a global importance score for each sentence under a single relation. Motivated by the fact that both documents and sentences can be presented by a mixture of semantic topics detected by Latent Dirichlet Allocation (LDA), we propose SentTopic-MultiRank, a novel ranking model for multi-document summarization. It assumes various topics to be heterogeneous relations, then treats sentence connections in multiple topics as a heterogeneous network, where sentences and topics/relations are effectively linked together. Next, the iterative algorithm of MultiRank is carried out to determine the importance of sentences and topics simultaneously. Experimental results demonstrate the effectiveness of our model in promoting the performance of both generic and query-biased multi-document summarization tasks.
منابع مشابه
WordTopic-MultiRank: A New Method for Automatic Keyphrase Extraction
Automatic keyphrase extraction aims to pick out a set of terms as a representation of a document without manual assignment efforts. Supervised and unsupervised graph-based ranking methods have been studied for this task. However, previous methods usually computed importance scores of words under the assumption of single relation between words. In this work, we propose WordTopic-MultiRank as a n...
متن کاملMulti-layered graph-based multi-document summarization model
Multi-document summarization is a process of automatic generation of a compressed version of the given collection of documents. Recently, the graph-based models and ranking algorithms have been actively investigated by the extractive document summarization community. While most work to date focuses on homogeneous connecteness of sentences and heterogeneous connecteness of documents and sentence...
متن کاملAn Exploration of Document Impact on Graph-Based Multi-Document Summarization
The graph-based ranking algorithm has been recently exploited for multi-document summarization by making only use of the sentence-to-sentence relationships in the documents, under the assumption that all the sentences are indistinguishable. However, given a document set to be summarized, different documents are usually not equally important, and moreover, different sentences in a specific docum...
متن کاملDecayed DivRank for Guided Summarization
Guided summarization is essentially an aspect-based multi-document summarization, where aspects can be taken as specified queries in summarization. We proposed a novel ranking algorithm, Decayed DivRank (DDRank) for guided summarization tasks of TAC2011. DDRank can address relevance, importance, diversity, and novelty simultaneously through a decayed vertex-reinforced random walk process in sen...
متن کاملMulti-Document Summarization via Discriminative Summary Reranking
Existing multi-document summarization systems usually rely on a specific summarization model (i.e., a summarization method with a specific parameter setting) to extract summaries for different document sets with different topics. However, according to our quantitative analysis, none of the existing summarization models can always produce high-quality summaries for different document sets, and e...
متن کامل